-
Notifications
You must be signed in to change notification settings - Fork 1k
modularize optimization internals #7401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7401 +/- ##
==========================================
- Coverage 99.02% 99.01% -0.02%
==========================================
Files 87 87
Lines 16754 16843 +89
==========================================
+ Hits 16591 16677 +86
- Misses 163 166 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Generated via commit 283ba85 Download link for the artifact containing the test results: ↓ atime-results.zip
|
|
I'm also not sure about moving the tests to |
|
@MichaelChirico I'm also not 100% convinced about the new |
I see. I still like the idea of a separate script -- the more we peel out of the behemoth tests.Rraw, the better. "eventually" it would be nice to have most tests live in purpose-made test scripts, IMO. |
| test(2357.2, fread(paste0("file://", f)), DT) | ||
| }) | ||
|
|
||
| # gforce should also work with Map in j #5336 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one last idea -- what happens when the grouping column is part of the aggregation in j?
DT[, .(sum(b) - mean(a)), by=b]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the grouping column is part of the aggregation we turn off GForce since it will be in .SDall
Lines 430 to 432 in 8129198
| for (ii in seq.int(from=2L, length.out=length(jsub)-1L)) { | |
| if (!.gforce_ok(jsub[[ii]], SDenv$.SDall, envir)) {GForce = FALSE; break} | |
| } |
| } | ||
|
|
||
| # attempts to optimize j expressions using lapply, GForce, and mean optimizations | ||
| .attempt_optimize = function(jsub, jvnames, sdvars, SDenv, verbose, i, byjoin, f__, ansvars, use.I, lhs, names_x, envir) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really like how clean this is 👍
MichaelChirico
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About halfway done reading the implementation now. Thanks for your patience with the review! I'm really excited for this to get finished :)
|
|
||
| # Optimize expressions using GForce (C-level optimizations) | ||
| # This function replaces functions like mean() with gmean() for fast C implementations | ||
| .optimize_gforce = function(jsub, SDenv, verbose, i, byjoin, f__, ansvars, use.I, lhs, names_x, envir) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing that comes to mind seeing such a long signature -- using a "struct" instead of passing individual arguments, e.g.
There may be some possibility to make the code easier to understand if some arguments are grouped or combined.
Not a requirement but something to ponder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think if I we would use structs/lists then we should probably use them for all helpers here, no (for consistency?), e.g. also .optimize_sd_subset, .optimize_c_expr, .optimize_lapply, .optimize_gforce, .optimize_mean and .attempt_optimize.
For .optimize_gforce I can even see the benefit for the long signature but on the other side we run into the problem that arguments might get lost in there...

Mapinstead oflapplyturns GForce off #5336Adds arithmetic for GForce as demanded in #3815 but does not add support for blocks in j like
d[, j={x<-x; .(min(x))}, by=y].